METHOD OF RENAMING REGISTERS IN REGISTER FILE 
AND MICROPROCESSOR THEREOF 

Background of the Invention 

1. Field of the Invention 

The present invention relates to functional units and registers for processing
data in a microprocessor, and more particularly, to a microprocessor with clusters
and register files which are associated with each other to enhance the efficiency of
data processing therein.

2. Description of the Related Art 

A microprocessor in an electronic system generally contains multiple
functional units and multiple registers for processing data. Each
functional unit executes instructions and writes the resulting data into the
pertinent register(s) in a register file. A functional unit may be any data
computation unit, such as an arithmetic logic unit (ALU), an adder unit, a
floating point unit, a load/store unit, etc.

Since the functional units in a microprocessor may all dispatch data to a register
file in the same cycle, the register file should have as many write ports as there
are functional units to satisfy the "peak data write requirement", which arises when
all the functional units generate data to be written into the register file in the
same cycle. Thus, as the number of functional units in a microprocessor increases,
the number of write ports of the register file must also increase to satisfy the
peak data write requirement.

An increase in the number of ports in a register file increases both the area
required to implement the register file and the time required to access data
in the register file. For example, in a data write mode, the number of write ports in
a register file determines the number of data values (or, the amount of data) that
can be simultaneously written into the register file.

Referring to Fig. 1, there is provided a block diagram illustrating a register
file and functional units in a typical microprocessor. The microprocessor 10 may
have "n" functional units FU1-FUn, each of which can simultaneously produce data
every cycle. In this case, to satisfy the peak data write requirement, the
microprocessor 10 should have a register file 12 with the same number of write
ports WP1-WPn as that of the functional units FU1-FUn, i.e., "n" write ports.

When a microprocessor requires more functional units, the number of write
ports of its register file must be increased accordingly. Such an increase in the
number of write ports adversely affects the size and speed of the microprocessor.

To overcome such problems in conventional microprocessors, a register
file in a microprocessor may be designed to have fewer write ports than the
number of functional units. In such processors, it is necessary to arbitrate among
the functional units for the write ports of the register file. In other words, an
arbitration unit is required to manage data communication between the functional
units and the write ports of the register file.

In an arbitration process, a functional unit first sends a request signal
to an arbitration unit to write data into a register file. The arbitration unit receives
all request signals from the functional units and then grants certain functional units
access to the write ports in accordance with an arbitration logic. The
functional units whose requests have been granted may then proceed to write data
into the register file, and the functional units whose requests have not been
granted must request access again in the next cycle.
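
The arbitration process described above can be sketched as follows. This is only
an illustration: the text does not specify the arbitration logic, so a fixed
priority by functional-unit index is assumed here, and the function name
`arbitrate` is hypothetical.

```python
def arbitrate(requests, num_write_ports):
    """Grant write ports to requesting functional units for one cycle.

    requests: indices of the functional units that want to write this cycle.
    Returns (granted, stalled). Fixed priority by index is an assumption;
    the text only says grants follow "an arbitration logic".
    """
    ordered = sorted(requests)
    granted = ordered[:num_write_ports]
    stalled = ordered[num_write_ports:]  # these must request again next cycle
    return granted, stalled


# Four functional units compete for two write ports in the same cycle.
granted, stalled = arbitrate([3, 0, 2, 1], num_write_ports=2)
```

Under this assumed policy, two of the four functional units stall and must repeat
their requests in the next cycle.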

In a microprocessor adopting the arbitration technique, each
functional unit must send an access request and wait for the grant, which causes
additional delay in data processing in the microprocessor. For example, the cycle
time of the microprocessor may be increased by the time period required for the
arbitration process. Also, the arbitration process may affect performance of the
microprocessor by forcing the functional units to stall if no write port is free.

Another example of a conventional approach in this area can be found in
"The Multi-cluster Architecture: Reducing Cycle Time Through Partitioning" by K. I.
Farkas et al., pp. 149-159, MICRO-30, Dec. 1997. In this reference, architected
registers are partitioned for the purpose of decoupling clusters and reducing the
number of read and write ports of a register file. In this technique, data read and
write operations can be performed only between particular register files and
functional units associated with each other. This technique is described below with
reference to Fig. 2.

In Fig. 2, the first and second functional units FU1, FU2 are associated with
the first and second register files RF1, RF2, respectively. The first register file RF1
has architected registers r0-r15, and the second register file RF2 has architected
registers r16-r31. The first functional unit FU1 has efficient access to the architected
registers r0-r15 in the first register file RF1, and the second functional unit FU2 has
efficient access to the architected registers r16-r31 in the second register file RF2.
For example, efficient access is accomplished when instruction "r7 <- r11 + r12"
is dispatched to the first functional unit FU1, and instruction "r17 <- r23 + r31" is
dispatched to the second functional unit FU2.

However, this technique has drawbacks for instructions such as
"r7 <- r11 + r31" dispatched to the first functional unit FU1. In this
case, to obtain the contents of the architected register r31, the first functional unit
FU1 must access the second register file RF2. The access path between
the first functional unit FU1 and the second register file RF2 is so slow that
performance of the microprocessor may be severely degraded.

Another problem in the microprocessor in Fig. 2 is that computation may be
distributed unevenly. In other words, if the program being
executed in the microprocessor uses mostly architected registers r0-r15 of the first
register file RF1, the computation for the program is not evenly distributed and the
registers r16-r31 in the second register file RF2 are underutilized.
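
The slow-path drawback of the partitioning in Fig. 2 can be illustrated with a
small sketch. The helper names `home_file` and `remote_operands` are hypothetical
and appear nowhere in the cited reference; only the register ranges and the two
example instructions are taken from the text above.

```python
REGS_PER_FILE = 16  # Fig. 2: r0-r15 reside in RF1, r16-r31 in RF2


def home_file(arch_reg):
    """Return 1 for RF1 (r0-r15) or 2 for RF2 (r16-r31)."""
    return 1 if arch_reg < REGS_PER_FILE else 2


def remote_operands(functional_unit, sources):
    """List the source registers that reside in the other register file."""
    return [r for r in sources if home_file(r) != functional_unit]


# r7 <- r11 + r12 on FU1: both sources are local to RF1.
local_case = remote_operands(1, [11, 12])
# r7 <- r11 + r31 on FU1: r31 must be read from RF2 over the slow path.
slow_case = remote_operands(1, [11, 31])
```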

Therefore, a need exists for a microprocessor having fewer write
ports in a register file than the number of functional units, without
problems such as the performance delay or degradation caused by the arbitration
process, data access through slow paths, uneven distribution of
computation, etc.

Objects and Summary of the Invention

It is an object of the present invention to provide a microprocessor having
fewer write ports in a register file than the number of functional units in
the microprocessor.

It is another object of the present invention to provide a method of
designing a microprocessor with register files and functional units which satisfy the
"peak data write requirement", while the register files have fewer write
ports than the number of functional units.

To accomplish the above and other objects of the present invention, there is
provided a microprocessor for processing instructions, comprising a plurality of
clusters for receiving the instructions, each of the clusters having a plurality of
functional units for executing the instructions; and a plurality of register sub-files
each having a plurality of registers for storing data for executing the instructions,
wherein each of the clusters is associated with a corresponding one of the register
sub-files so that an instruction dispatched to a cluster is executed by accessing
registers in a register sub-file associated with the cluster to which the instruction is
dispatched. Each of the register sub-files preferably has one write port to which a
corresponding cluster sends data to be written into registers in the register sub-file
associated with the corresponding cluster, and the register sub-files each have the
same number of registers.

The microprocessor may also include a register-renaming unit for renaming
target registers in an instruction with registers in a register sub-file associated with
a cluster to which the instruction is dispatched. The register-renaming unit
identifies a register to be used to store a value named by a target register in the
instruction. The microprocessor may also include issue-queue units, each of which
is associated with a corresponding one of the clusters, and an instruction dispatch
mechanism for determining which of the clusters each instruction is dispatched to.
An issue-queue unit holds an instruction renamed by the register-renaming unit until
the renamed instruction is issued to be executed in the cluster associated with the
issue-queue unit, and the instruction dispatch mechanism controls the
issue-queue units to determine which of the instructions need to be executed.

In another aspect of the present invention, a system is provided for
processing an instruction in a microprocessor. The system comprises at least one
cluster having at least one functional unit for executing the instruction; and at least
one register file having a predetermined number of physical registers to and from
which data is written and read in accordance with the instruction, wherein the at
least one register file has one write port to which an output of the at least one
cluster is connected, and a data write operation in accordance with the instruction
executed by the at least one functional unit is performed by accessing the physical
registers of the at least one register file.

The system may also include means for renaming architected registers of
the instruction with the physical registers of the at least one register file, and at
least one issue-queue unit associated with the at least one cluster, for holding
an instruction renamed by the means for renaming until the instruction is issued to be
executed in the at least one cluster.

In another aspect of the present invention, a method is provided for
processing instructions in a microprocessor. The method comprises the steps of
providing clusters each having functional units for executing the instructions;
dividing a register file into a plurality of register sub-files each having registers to
store data for executing the instructions; associating each of the register sub-files
with a corresponding one of the clusters; selecting a cluster to which an instruction is
dispatched; renaming target registers in the instruction with registers in a register
sub-file associated with the selected cluster; and dispatching the instruction to the
selected cluster wherein the instruction is executed by functional units. The
dividing step may also include assigning the same number of registers to each of the
register sub-files. The associating step may include providing one write port for
each of the register sub-files so that a cluster associated with a register sub-file
sends data to be written to a write port of the register sub-file. The renaming step
may include identifying a register in a register sub-file to be used to store a value
named by a target register in the instruction.

These and other objects, features and advantages of the present invention 
will become apparent from the following detailed description of illustrative 
embodiments thereof, which is to be read in connection with the accompanying 
drawings.

Brief Description of the Drawings

Fig. 1 is a block diagram illustrating a register file and functional units in a
conventional microprocessor;

Fig. 2 is a block diagram illustrating register files and functional units in
another conventional microprocessor;

Fig. 3 is a block diagram illustrating register sub-files and clusters in a
microprocessor according to a preferred embodiment of the present invention;

Fig. 4 is a block diagram illustrating a microprocessor according to
another embodiment of the present invention; and

Fig. 5 is a flow chart describing operation of the microprocessor in Fig. 4.

Description of Preferred Embodiments

Detailed illustrative embodiments of the present invention are disclosed
herein. However, specific structural and functional details disclosed herein are
merely representative for purposes of describing preferred embodiments of the
present invention.

Referring to Fig. 3, a block diagram is provided for illustrating a
microprocessor according to a preferred embodiment of the present invention. In
the microprocessor 30, a register file is divided into multiple register sub-files
RSF0-RSFn. Each of the register sub-files RSF0-RSFn includes a set of physical
registers (refer to Fig. 4). Preferably, the register sub-files RSF0-RSFn each have
the same size, i.e., the same number of physical registers, and have one write port
WP0-WPn, respectively, through which data is written into registers in a
corresponding register sub-file. Each register sub-file also has at least one read
port RP0-RPn through which data is read from registers in a corresponding register
sub-file.

The microprocessor 30 also has multiple clusters CL0-CLn, each of which
includes a set of functional units. Each register sub-file is associated with
a corresponding one of the clusters. In the microprocessor 30, the clusters CL0-CLn
are functionally and/or structurally associated with the register sub-files
RSF0-RSFn, respectively.

In this embodiment, a cluster sends data only to the register sub-file
associated with the cluster in a data write operation, while a cluster can read data
from any of the register sub-files RSF0-RSFn in a data read operation. For
example, when a write instruction is dispatched to cluster CL0 to be executed by
the functional unit(s) therein, only register(s) in register sub-file RSF0 associated
with the cluster CL0 may be accessed to write data therein. Thus, it is not
necessary for each register sub-file to support write instructions issued from all the
clusters CL0-CLn. Instead, each register sub-file only needs to support write
instructions from the functional units within the cluster associated with the register
sub-file.
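
The write and read rules of this embodiment can be modeled in a few lines. The
class below is a software sketch only, not the hardware of Fig. 3; the class name,
method names, and the sub-file size of four registers are assumptions made for
illustration.

```python
class RegisterSubFile:
    """One sub-file: written only by its own cluster, readable by any cluster."""

    def __init__(self, owner_cluster, first_reg, count):
        self.owner = owner_cluster
        self.regs = {first_reg + i: 0 for i in range(count)}

    def write(self, cluster, reg, value):
        # Only the associated cluster drives this sub-file's write port.
        if cluster != self.owner:
            raise PermissionError("cluster is not associated with this sub-file")
        self.regs[reg] = value

    def read(self, cluster, reg):
        # Any cluster may read; `cluster` is accepted only to mirror write().
        return self.regs[reg]


rsf0 = RegisterSubFile(owner_cluster=0, first_reg=0, count=4)
rsf0.write(cluster=0, reg=2, value=123)           # CL0 writes its own sub-file
value_seen_by_cl1 = rsf0.read(cluster=1, reg=2)   # CL1 may still read it
```

A write attempted by any other cluster is rejected in this model, which mirrors
the statement that each sub-file only needs to support writes from its own cluster.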

Referring to Fig. 4, it is assumed for convenience of description that a
microprocessor 40 has two (2) clusters CL0, CL1 and two (2) register sub-files
RSF0, RSF1. The first and second clusters CL0, CL1 are functionally and
structurally associated with the first and second register sub-files RSF0, RSF1,
respectively. The first and second clusters CL0, CL1 each have multiple functional
units each generating one output result per cycle. The first and second register
sub-files RSF0, RSF1 each have multiple physical registers. For example, the first
register sub-file RSF0 has physical registers R0-R39, and the second register
sub-file RSF1 has physical registers R40-R79.

The first and second register sub-files RSF0, RSF1 also have write ports
WP0, WP1, respectively. Thus, the first cluster CL0 (or functional units in the first
cluster) accesses the registers in the first register sub-file RSF0 to write data in the
registers therein, and the second cluster CL1 (or functional units in the second
cluster) accesses the registers in the second register sub-file RSF1 to write data in
the registers therein. Since each cluster is associated with the corresponding
register sub-file, the microprocessor 40 with register sub-files each having only
one write port satisfies the peak data write requirement in a data write operation.

In the microprocessor 40, a register-renaming unit 42 is also provided for
performing a register-renaming process on instructions to be transferred
to the clusters CL0, CL1, where they are then executed by the functional units
therein. It should be noted that the register-renaming unit 42 may be configured
outside the microprocessor 40, and that the register-renaming process may be
implemented by a software program without any separate hardware structure.

In the register-renaming unit 42, architected registers in an instruction are
mapped into physical registers in the register sub-files RSF0, RSF1. Architected
registers are used to identify values associated with computation of a
microprocessor. For example, in instructions "r3 <- add r7, r9" and "r3 <- mul r3, r2",
register r3 is an architected register. Register r3 first contains the result of the
addition, which is then used as an input to the multiply. The result of the multiply is
then stored in register r3. Generally, there is a fixed number of architected
registers for a particular instruction set architecture (ISA). For example, the
PowerPC ISA has thirty-two (32) general purpose architected registers.

Physical registers in the register sub-files RSF0, RSF1 are the hardware
realization of the architected registers. For a microprocessor, there can be more
physical registers than architected registers. Thus, values named by a specific
architected register may reside in different physical registers. For example, in the
above instructions "r3 <- add r7, r9" and "r3 <- mul r3, r2", the result of the addition
may be placed in physical register R54. Then, when the multiply is executed, the
physical register R54 is read to obtain its content and the result of the multiply may
be placed in physical register R20.

In a register-renaming process, each architected register is mapped into
a corresponding one of the physical registers. In the above example, architected
register r3 may be mapped into physical register R54 or R20.
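
The mapping in this example can be traced with a simple rename table. The sketch
below is illustrative only: the physical registers R54 and R20 come from the
example above, while the initial mappings of r2, r7 and r9 and the function name
`rename` are assumptions.

```python
# Maps each architected register to the physical register holding its latest
# value. The initial entries for r2, r7 and r9 are arbitrary assumptions.
rename_map = {"r2": "R2", "r7": "R7", "r9": "R9"}


def rename(target, sources, new_physical):
    """Rename one instruction; return (physical target, physical sources)."""
    phys_sources = [rename_map[s] for s in sources]  # read current mappings first
    rename_map[target] = new_physical                # then remap the target
    return new_physical, phys_sources


add_renamed = rename("r3", ["r7", "r9"], "R54")  # r3 <- add r7, r9
mul_renamed = rename("r3", ["r3", "r2"], "R20")  # r3 <- mul r3, r2
```

The sources are looked up before the target is remapped, so the multiply reads
R54 (the result of the addition) even though its own result is placed in R20.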

Preferably, in the register-renaming process, target registers in an
instruction are renamed with physical registers in the register sub-files RSF0,
RSF1. In other words, the renaming identifies a physical register in a register
sub-file that will be used to store the value named by a target register in an
instruction. A target register is an architected register in an instruction that will be
provided with a result of the instruction. For example, in the instruction
"r3 <- add r7, r9", register r3 is a target register.



YOR9-2001-0204US1 (8728-500) 11 



Prior to the register-renaming process, it is necessary to determine which of
the clusters each instruction is dispatched to. Such determination may be
performed in an instruction dispatch mechanism 44. Once an instruction is
determined to be dispatched to a particular cluster, target registers in the
instruction are renamed with physical registers in the register sub-file which is
functionally associated with the particular cluster. For example, when an
instruction is determined to be dispatched to the first cluster CL0 to be executed by
the functional units therein, target registers of the instruction are renamed with the
physical registers in the first register sub-file RSF0, i.e., registers R0-R39.

The microprocessor 40 may also include issue-queue units 46, 48 which
are functionally associated with the register sub-files RSF0, RSF1, respectively.
The issue-queue units 46, 48 hold the state identifying which of the instructions
need to be executed. Thus, in the issue-queue units, register-renamed
instructions (i.e., instructions after the register-renaming process) are held until
they are issued to be executed by functional units in the appropriate
cluster. The instruction dispatch mechanism 44 also determines which of the
issue-queue units each instruction is transferred to.
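
The role of the issue-queue units can be sketched as one queue per cluster. This
is a hedged illustration: first-in, first-out issue order is an assumption (the
text does not state an issue policy), and the function names and the instruction
strings are hypothetical.

```python
from collections import deque

# One queue per cluster, standing in for issue-queue units 46 and 48 in Fig. 4.
issue_queues = {0: deque(), 1: deque()}


def enqueue(cluster, renamed_instruction):
    """Hold a renamed instruction until its cluster can execute it."""
    issue_queues[cluster].append(renamed_instruction)


def issue(cluster):
    """Issue the oldest held instruction, or None if the queue is empty."""
    queue = issue_queues[cluster]
    return queue.popleft() if queue else None


enqueue(0, "R5 <- add R41, R7")
enqueue(0, "R6 <- mul R5, R2")
first_issued = issue(0)
nothing_for_cl1 = issue(1)
```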

In Fig. 5, a flow chart is provided for describing the method of 
register-renaming according to the present invention. In a microprocessor with a 
register file having multiple physical registers, the register file is divided into 
multiple register sub-files (step 51). As a result, each of the register sub-files has 
a predetermined number of physical registers and preferably one write port. The
physical registers may be grouped evenly so that the register sub-files each have
the same number of physical registers. 

Each of the register sub-files is associated with a particular cluster having 
multiple functional units for executing instructions (step 53). A register sub-file is 
functionally associated with a corresponding cluster so that instructions dispatched 
to the cluster are supported by physical registers in the register sub-file associated 
with the corresponding cluster. Then, it is determined which of the clusters each 
instruction is dispatched to (step 55). Each instruction is dispatched to a selected 
cluster to be executed by functional units in that cluster. 

The register-renaming process is performed with respect to the instructions, 
where architected registers (preferably, target registers) in an instruction are 
renamed with physical registers in the register sub-files (step 57). For example, 
when an instruction is determined to be dispatched to a cluster, target registers in 
the instruction are renamed with physical registers in a register sub-file associated 
with the cluster. 

Upon completion of the register-renaming process, each instruction is
dispatched to the corresponding cluster determined in step 55 (step 59). Thus, the
instruction is executed by functional units in the cluster. For the execution of the
instruction, only the physical registers in the register sub-file associated with the
cluster are accessed to store data from the cluster.
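
Steps 51 through 59 of Fig. 5 can be traced end to end with the sketch below. It
is not the claimed implementation: round-robin cluster selection and taking the
lowest-numbered free register are assumptions, and all function names are
hypothetical. The sizes follow Fig. 4 (80 physical registers, two clusters).

```python
def divide_register_file(num_physical, num_clusters):
    """Steps 51 and 53: split evenly; sub-file i is associated with cluster i."""
    size = num_physical // num_clusters
    return {c: list(range(c * size, (c + 1) * size)) for c in range(num_clusters)}


def process(instructions, num_physical=80, num_clusters=2):
    free = divide_register_file(num_physical, num_clusters)
    rename_map = {}  # architected name -> physical register name
    dispatched = []
    for i, (target, sources) in enumerate(instructions):
        cluster = i % num_clusters  # step 55 (round-robin is an assumption)
        # Sources never written in this trace keep their architected names.
        phys_sources = [rename_map.get(s, s) for s in sources]
        phys_target = f"R{free[cluster].pop(0)}"  # step 57: own sub-file only
        rename_map[target] = phys_target
        dispatched.append((cluster, phys_target, phys_sources))  # step 59
    return dispatched


# r3 <- add r7, r9 followed by r3 <- mul r3, r2
result = process([("r3", ["r7", "r9"]), ("r3", ["r3", "r2"])])
```

In this trace each target register is always taken from the sub-file of the
cluster that will execute the instruction, so every write goes through that
cluster's own write port, while a source may name a register in another sub-file.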

Having described preferred embodiments of a system and method of
register-renaming in a microprocessor according to the present invention,
it is noted that modifications and variations can be readily made by those skilled
in the art in light of the above teachings. It is therefore to be understood that,
within the scope of the appended claims, the present invention can be practiced in
a manner other than as specifically described herein.